Walktober

  • Our department held a walkathon in October, where we all competed to see how many steps we could walk each day

Data quality angel

  • Since this is a stats department, I figured we would be data quality angels, so before the competition started I looked up the most accurate pedometer

Pedometer = bad

  • It turns out, pedometers are wildly inaccurate

Make the most of it

  • So I picked the most accurate and inexpensive measuring device based on the small literature review I did

Data quality demons

  • It turns out I was the ONLY person concerned about data quality
  • We ran a survey after Walktober to see if we could quantify the inaccuracy in the data
  • Turns out the measurement error was the least of my worries

Survey response

Quantifiable vs unquantifiable uncertainty

  • I can incorporate pedometer error estimates into our analysis; I CANNOT work with completely falsified data
  • This is the difference between quantifiable and unquantifiable uncertainty
  • We are going to try to quantify the uncertainty that we can quantify
    • “Anything worth doing is worth doing poorly” - G. K. Chesterton
  • Toss the falsified stuff and try to quantify the pedometer measurement error stuff

Not an uncommon scenario

Often our data is…

  • Unavailable,
    • e.g. anonymised data, measurement error, etc.
  • Non-deterministic
    • e.g. bounded data, estimated values, etc.
  • or Theoretical
    • e.g. estimates based on theory, latent variables, etc.

How many statisticians does it take to visualise a random variable?

  • Even though we usually work with random variables, we are unable to visualise them effectively
  • Our choice of error distribution might change the conclusion of our analysis in unexpected ways
  • Often our solution is to just ignore the inherent uncertainty in our data

The visualisation challenge

  • Our department decided to run a visualisation challenge with the Walktober data

I don’t want to ignore it

  • But I did all that reading about pedometers, so I would like to incorporate that uncertainty

Spot the difference

  • Maps of temperature in Iowa counties
  • I chose two error distributions, can you spot the difference?

Exceedance probability map

  • If you care about the uncertainty, visualise the uncertainty
  • Why don’t we just visualise the probability or variance by itself?
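For readers who have not met the term, an exceedance probability is the probability that the true value is above some threshold. The sketch below assumes a normal error distribution around each estimate, so the probability is 1 - Φ((t - μ)/σ). The county names, estimates, standard errors, and threshold are all made up for illustration and are not the Iowa data from the slides.

```python
from statistics import NormalDist


def exceedance_probability(mean, std_error, threshold):
    """Return P(X > threshold) for X ~ Normal(mean, std_error**2)."""
    return 1.0 - NormalDist(mu=mean, sigma=std_error).cdf(threshold)


# Hypothetical (estimate, standard error) pairs; not real data.
estimates = {
    "county_a": (30.0, 1.0),  # above the threshold, precise
    "county_b": (30.0, 5.0),  # same estimate, much less certain
    "county_c": (25.0, 5.0),  # below the threshold, uncertain
}
threshold = 28.0  # assumed threshold of interest

exceedance = {
    name: exceedance_probability(mean, se, threshold)
    for name, (mean, se) in estimates.items()
}

for name, prob in exceedance.items():
    print(f"{name}: P(value > {threshold}) = {prob:.3f}")
```

Note that county_a and county_b share the same estimate but get different probabilities, and the map of these probabilities no longer shows the estimates themselves.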

A terrible vet

Uncertainty as signal vs noise

  • Uncertainty can play two roles in an analysis
    • Sometimes it is used to hedge or dampen our conclusions on other statistics
    • Sometimes it is a statistic of inference itself
  • A visualisation is a statistic which means, just like other statistics, we use them to draw inference
    • If we want to draw inference on uncertainty: visualise uncertainty as signal
    • If it is supposed to hedge our inference from the plot: it is noise
  • An exceedance probability map is fine if we want to draw inference on our uncertainty, but not fine if we were trying to hedge the original plot

Solution: add an axis for uncertainty

  • At least we are trying to incorporate it into the original plot this time
  • 2D palette is harder to read
    • Colour is not a simple 3D space
    • Using saturation hurts accessibility
  • Doesn’t affect the visibility of the plot

I keep getting scammed

Aside: why doesn’t this work?

  • Uncertainty is not just another variable…
    • It presents an interesting perceptual problem
  • Usually we do not want variables to interfere with each other
    • In uncertainty visualisation, the opposite is true

Uncertainty visualisation for signal suppression

  • Statistical validity translates to perceptual ease
    • The higher the variance on an estimate, the harder that estimate is to extract from the plot

Solution: blend the colours together!

  • Made signal harder to see… but maybe too hard?
  • Still have a 2D colour palette
  • Standard error at which to blend colours is made up
    • Blend at 1? 2? 4? 37?
    • Impossible to align with hypothesis testing
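To make the "made up" cutoff concrete, here is a minimal sketch of one possible blending rule. It is not the implementation behind the slides: the linear blend, the grey target colour, and the function name `blend_towards_grey` are all assumptions chosen for illustration.

```python
def blend_towards_grey(rgb, std_error, cutoff, grey=(128, 128, 128)):
    """Linearly blend an RGB colour towards grey as the standard error grows.

    At std_error == 0 the original colour is returned; at std_error >= cutoff
    the colour is fully grey. The cutoff is a free parameter with no
    statistical justification, which is the problem noted above.
    """
    weight = min(max(std_error / cutoff, 0.0), 1.0)
    return tuple(round((1 - weight) * c + weight * g) for c, g in zip(rgb, grey))


signal_colour = (200, 30, 30)  # hypothetical colour for a high estimate

# The same estimate and standard error, under two arbitrary cutoffs.
with_cutoff_2 = blend_towards_grey(signal_colour, std_error=1.0, cutoff=2.0)
with_cutoff_4 = blend_towards_grey(signal_colour, std_error=1.0, cutoff=4.0)
print(with_cutoff_2, with_cutoff_4)
```

The two results differ even though the data are identical, so the reader's impression depends on a number the plot maker picked.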

Free yourself from the two variable approach

  • Realistically, we are trying to add information back in that we just shouldn’t have dropped
  • We need a more holistic approach that doesn’t allow us to pick and choose when and how we include uncertainty
  • Uncertainty visualisation doesn’t have units of data, it has units of “random variables” so we should directly input random variables

Visualise random variables with distributions

steps_dist                  team     name
N(23679, 4633687)[0,Inf]    iwalk()  A
N(18322, 2774223)[0,Inf]    iwalk()  A
N(24562, 5e+06)[0,Inf]      iwalk()  A
N(26128, 5642050)[0,Inf]    iwalk()  A
N(10238, 866202)[0,Inf]     iwalk()  A
  • It turns out you can.
  • These columns are made using the distributional package
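The slides build these columns in R with distributional. As a language-agnostic illustration of what one entry such as N(23679, 4633687)[0,Inf] represents, here is a minimal Python sketch of a normal distribution truncated below at zero. It assumes the second number is the variance; the `TruncatedNormal` class and the rejection sampler are hypothetical and are not how distributional works internally.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TruncatedNormal:
    """A normal distribution restricted to values >= lower."""

    mean: float
    variance: float  # assumed: second parameter in the table is a variance
    lower: float = 0.0

    def sample(self, rng):
        """Draw one value by rejection: redraw until the value is >= lower."""
        sd = math.sqrt(self.variance)
        while True:
            draw = rng.gauss(self.mean, sd)
            if draw >= self.lower:
                return draw


# The first two rows of the table above, stored as distributions, not numbers.
steps = [
    TruncatedNormal(mean=23679, variance=4633687),
    TruncatedNormal(mean=18322, variance=2774223),
]

rng = random.Random(42)
draws = [steps[0].sample(rng) for _ in range(2000)]
print(f"mean of draws: {sum(draws) / len(draws):.0f}")
```

Each row holds a whole distribution, so any later step (summaries, plots) can draw from it instead of working with a single point estimate.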

Solution: simulate a sample

  • Made using Vizumap’s pixelmap function
  • Not actually making any top level decisions, just letting the variance from the uncertainty carry through
  • The signal seems harder to read
  • 1D colour palette
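The idea can be sketched without any mapping library: split each region into pixels, give each pixel its own draw from the region's distribution, and colour every draw with the same 1D palette. The code below is an illustration under those assumptions, not Vizumap's implementation; the function names, the normal error model, and the numbers are hypothetical.

```python
import random


def simulate_pixels(mean, sd, n_pixels, rng):
    """Draw one value per pixel from the region's (assumed normal) distribution."""
    return [rng.gauss(mean, sd) for _ in range(n_pixels)]


def to_palette_index(value, low, high, n_colours):
    """Map a value onto one of n_colours bins of a 1D palette spanning [low, high]."""
    clamped = min(max(value, low), high)
    fraction = (clamped - low) / (high - low)
    return min(int(fraction * n_colours), n_colours - 1)


rng = random.Random(1)

# Two hypothetical regions with the same estimate but different uncertainty.
precise = simulate_pixels(mean=20.0, sd=0.5, n_pixels=400, rng=rng)
noisy = simulate_pixels(mean=20.0, sd=5.0, n_pixels=400, rng=rng)

precise_colours = {to_palette_index(v, 0.0, 40.0, 9) for v in precise}
noisy_colours = {to_palette_index(v, 0.0, 40.0, 9) for v in noisy}

print("distinct colours in precise region:", len(precise_colours))
print("distinct colours in noisy region:", len(noisy_colours))
```

The noisy region ends up speckled with many colours while the precise one stays nearly uniform, so the variance itself makes the signal harder to read and no cutoff has to be chosen.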

But let’s take this one step further…

Universal application in ggdibbler

  • ggdibbler applies this concept to every plot and every aesthetic

Text plot

Spatial pixel map

Bar charts

Raster plots

Contour plots

ggdibbler also ensures your plots have nice statistical properties

  • Statistical properties are what differentiate us from the animals

Visual Continuous mapping theorem

Example in geom_tile

Nested Positions

  • How we guarantee these properties always hold in ggdibbler

Back to walktober example

Future Plans

  • Future of the software
    • multivariate distributions and other more complex joint distributions
    • build out the nested position system
    • expand on the scales to accept more object types
  • Unemployment
    • I also need a job (I am holding my software hostage)
    • If you want to give me a job, my email is harriet.m.mason@gmail.com

Acknowledgements

  • My Supervisors: Di Cook, Susan Vanderplas, and Sarah Goodwin
  • AEMO Zema Energy Scholarship
  • Australian RTP Stipend
  • Numbat Hackathon (for the walktober data)
  • Mitch O’Hara-Wild and Cynthia Huang